Using Performance Monitor to Optimize System Performance

ABSTRACT

An approach that optimizes system performance using performance monitors is presented. The system gathers thread performance data using performance monitors for threads running on either a first ISA processor or a second ISA processor. Multiple first processors and multiple second processors may be included in a single computer system. The first processors and second processors can each access data stored in a common shared memory. The gathered thread performance data is analyzed to determine whether the corresponding thread needs additional CPU time in order to optimize system performance. If additional CPU time is needed, the amount of CPU time that the thread receives is altered (increased) so that the thread receives the additional time when it is scheduled by the scheduler. In one embodiment, the increased CPU time is accomplished by altering a priority value that corresponds to the thread.

RELATED APPLICATIONS

This application is a continuation application of co-pending U.S. Non-Provisional patent application Ser. No. 11/425,448, entitled “System and Method for Using Performance Monitor to Optimize System Performance,” filed on Jun. 21, 2006.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates in general to a system and method for optimizing system performance using a performance monitor. More particularly, the present invention relates to a system and method that monitors threads in a plurality of dissimilar processors and optimizes CPU time among the processors based on analyzing data gathered for the various threads.

2. Description of the Related Art

Computing systems that use a combination of heterogeneous processors are becoming increasingly popular.

In these environments, one or more general purpose processors work in conjunction with one or more special purpose processors. Being different processor types, the general purpose processors use a different instruction set architecture (ISA) than the ISA used by the special purpose processors. Having different processing characteristics and ISAs lends each processor type to efficiently performing different types of tasks.

Because of the different characteristics of the processors, this heterogeneous environment is attractive to a variety of applications, such as multimedia, gaming, and numeric intensive applications. In this environment, a program can have multiple threads. Some of these threads can execute on the general purpose processors and other threads can execute on the special purpose processors. A challenge, however, is that resource availability is often not known until an application is running. It is therefore difficult to predetermine the amount of CPU time that should be allocated to the various threads. This challenge is exacerbated in a heterogeneous processing environment where one type of CPU (based on a first ISA) may be constrained, while another type of CPU (based on a second ISA) may not be constrained.

What is needed, therefore, is a system and method that monitors thread performance in a heterogeneous processing environment. What is further needed is a system and method that dynamically alters the amount of CPU time that threads receive based upon an analysis of the thread performance data.

SUMMARY

It has been discovered that the aforementioned challenges are resolved using a system and method that gathers thread performance data using a performance monitor. The threads may be running on either a first processor that is based on a first instruction set architecture (ISA), or a second processor that is based on a second ISA. Multiple first processors and multiple second processors may be included in a single computer system. The first processors and second processors can each access data stored in a common shared memory. The gathered thread performance data is analyzed to determine whether the corresponding thread needs additional CPU time in order to optimize system performance. If additional CPU time is needed, the amount of CPU time that the thread receives is altered (increased) so that the thread receives the additional time when it is scheduled by the scheduler. In one embodiment, the increased CPU time is accomplished by altering a priority value that corresponds to the thread.

In another embodiment, a user can configure the system by choosing performance selections that are stored and used by the performance monitor when gathering data. The user can also select which processors monitor thread performance. In this manner, if one processor is dedicated to a particular task and does not swap out for different threads, then there is little need to monitor the dedicated thread(s) running on the processor.

In another embodiment, a common scheduler is used to schedule threads to both the first processors and the second processors. In this embodiment, the thread performance data is stored in the shared memory. The scheduler determines whether a particular processor is running below a predefined CPU utilization. If the processor is running below the predefined utilization, then the CPU time that the threads receive for the processor is adjusted as described above. However, if the processor is running at an acceptable utilization level, then the CPU time that the threads receive is not adjusted.

The foregoing is a summary and thus contains, by necessity, simplifications, generalizations, and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention may be better understood, and its numerous objects, features, and advantages made apparent to those skilled in the art by referencing the accompanying drawings.

FIG. 1 is a diagram showing performance monitors in a heterogeneous processing environment gathering thread performance data that is used by schedulers to allocate CPU time;

FIG. 2 is a high-level flowchart showing the steps taken to use performance monitors to gather thread data in a heterogeneous processing environment;

FIG. 3 is a flowchart showing steps taken by a performance monitor to gather thread event data for a first CPU that is based on a first instruction set architecture (ISA);

FIG. 4 is a flowchart showing steps taken by a performance monitor to gather thread event data for one or more second CPUs that are each based on a second ISA;

FIG. 5 is a flowchart showing the steps taken by a scheduler to allocate CPU time based on gathered thread event data;

FIG. 6 is a block diagram of a traditional information handling system in which the present invention can be implemented; and

FIG. 7 is a block diagram of a broadband engine that includes a plurality of heterogeneous processors in which the present invention can be implemented.

DETAILED DESCRIPTION

The following is intended to provide a detailed description of an example of the invention and should not be taken to be limiting of the invention itself. Rather, any number of variations may fall within the scope of the invention, which is defined in the claims following the description.

FIG. 1 is a diagram showing performance monitors in a heterogeneous processing environment gathering thread performance data that is used by schedulers to allocate CPU time. In the example shown, two heterogeneous processor types are being used with each processor type based upon a different instruction set architecture (ISA). Processes that are being executed by processors based upon a first ISA are enclosed in box 130, while processes that are being executed by processors based upon a second ISA are enclosed in box 160. Processes being run by both ISAs include performance monitors and various threads. Performance monitor 150 monitors thread events occurring in the first ISA, while performance monitor 180 monitors thread events occurring in the second ISA. Threads 140 represent various threads that are being executed by processors based upon the first ISA, while threads 170 represent various threads that are being executed by processors based upon the second ISA. Processors of both ISAs are able to access data stored in shared memory 100. As explained in further detail in FIG. 7, in one embodiment, processors based on the first ISA are Primary Processing Elements (PPEs), while processors based on the second ISA are Synergistic Processing Elements (SPEs). In this embodiment, a broadband engine bus is used to facilitate access of the shared memory by the various processors.

In the embodiment shown in FIG. 1, thread event data is stored in shared memory 100. Thread event data for threads running on a first ISA processor (e.g., on one of the PPEs) are stored in memory area 110, while thread event data for threads running on a second ISA processor (e.g., on one of the SPEs) are stored in memory area 120. Scheduler 190 reads the thread event data and allocates CPU time accordingly. Scheduled threads are dispatched to either one of the processors based on the first ISA (processors 192) or to one of the processors based on the second ISA (processors 194). In one embodiment, a common scheduler schedules threads for both types of processors (processors 192 and 194). This embodiment facilitates scheduling of “assist” threads running on one of the SPEs at the same time the main thread is scheduled to run on one of the PPEs. Of course, those of skill in the art will appreciate that separate schedulers could be used so that one scheduler schedules threads to run on one type of processor, such as the PPEs, while another scheduler schedules threads to run on another type of processor, such as the SPEs.

FIG. 2 is a high-level flowchart showing the steps taken to use performance monitors to gather thread data in a heterogeneous processing environment. First, the small flowchart across the top commencing at 200 shows a user choosing performance selections which, at step 210, are received and stored in performance configuration file 220. In addition, the user can select which processors should monitor performance of threads running on the processor. For example, a particular process or thread can be dedicated to a given processor, such as one of the SPEs. As a dedicated process, the process is not swapped in and out, therefore monitoring its performance to increase its CPU time would not be needed since the process is already dedicated to a processor. Moreover, the user can decide to only monitor threads running on a particular processor type, such as monitoring threads running on the PPE and not those running on the SPEs, or vice versa. Finally, the user can also set thresholds on the various processors so that the CPU time alterations described herein are only performed when a processor's utilization is below the user-defined threshold. In this manner, the user can select the thresholds and events that trigger additional CPU time for threads as well as the processors where thread events are gathered by the performance monitors. Additionally, default configuration settings can be established setting default events to monitor as well as default processors and threshold values. When default settings are used, the mechanism shown in the small flowchart can then be used to alter these default settings. The small flowchart thereafter ends at 215.
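As a minimal sketch of how the performance selections and defaults described above might be represented, the following Python fragment models performance configuration file 220 as an in-memory object. The class name, field names, event names, processor identifiers, and default values are all illustrative assumptions; the description does not specify a file format or particular events.

```python
from dataclasses import dataclass, field


@dataclass
class PerformanceConfig:
    """Hypothetical in-memory form of performance configuration file 220."""
    # Default events to gather; the event names are illustrative assumptions.
    events: set = field(default_factory=lambda: {"cache_miss", "stall_cycles"})
    # Default processors whose threads are monitored (hypothetical ids).
    monitored: set = field(default_factory=lambda: {"PPE0", "SPE0", "SPE1"})
    # Default utilization thresholds, as fractions between 0 and 1.
    thresholds: dict = field(
        default_factory=lambda: {"PPE0": 0.8, "SPE0": 0.8, "SPE1": 0.8}
    )

    def apply_user_selections(self, events=None, monitored=None, thresholds=None):
        """Override defaults with whatever the user chose (step 210)."""
        if events is not None:
            self.events = set(events)
        if monitored is not None:
            self.monitored = set(monitored)
        if thresholds is not None:
            self.thresholds.update(thresholds)


config = PerformanceConfig()
# Monitor only the PPE and lower its threshold; SPE thresholds keep defaults.
config.apply_user_selections(monitored={"PPE0"}, thresholds={"PPE0": 0.5})
```

In this sketch, any selection the user leaves unspecified keeps its default value, which mirrors the described use of the small flowchart to alter default settings.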

Performance monitor processing is shown in the larger flowchart and commences at 225 whereupon, at step 230, the performance selections stored in performance configuration file 220 are checked. A determination is made as to whether thread events running on processors based on the first ISA (e.g., the PPE) are being monitored (decision 240). If thread events running on processors based on the first ISA are being monitored, decision 240 branches to “yes” branch 245 whereupon, at step 250, the selections from the performance configuration file are read indicating the type of events to gather for the threads and, at predefined process 260, the performance monitor that gathers thread event data for threads running on first ISA processors is initiated (see FIG. 3 and corresponding text for processing details). On the other hand, if thread events running on processors based on the first ISA are not being monitored, decision 240 branches to “no” branch 265 bypassing steps 250 and 260.

A determination is made as to whether thread events running on processors based on the second ISA (e.g., the SPEs) are being monitored (decision 270). If thread events running on processors based on the second ISA are being monitored, decision 270 branches to “yes” branch 275 whereupon, at step 280, the selections from the performance configuration file are read indicating the type of events to gather for the threads and, at predefined process 285, the performance monitor that gathers thread event data for threads running on second ISA processors is initiated (see FIG. 4 and corresponding text for processing details). On the other hand, if thread events running on processors based on the second ISA are not being monitored, decision 270 branches to “no” branch 290 bypassing steps 280 and 285. Processing thereafter ends at 295.
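The two decisions of the larger FIG. 2 flowchart could be sketched as follows. This is only an illustration: the dictionary keys and the two callables standing in for predefined processes 260 and 285 are hypothetical names, not part of the described system.

```python
def start_monitors(config, start_first, start_second):
    """Launch the monitors selected in the configuration; return which ran."""
    started = []
    if config.get("monitor_first_isa", False):       # decision 240
        start_first(config["events"])                # steps 250 and 260
        started.append("first_isa")
    if config.get("monitor_second_isa", False):      # decision 270
        start_second(config["events"])               # steps 280 and 285
        started.append("second_isa")
    return started


launched = []
result = start_monitors(
    {"monitor_first_isa": False, "monitor_second_isa": True,
     "events": ["cache_miss"]},
    start_first=lambda events: launched.append(("first", events)),
    start_second=lambda events: launched.append(("second", events)),
)
```

Here only the second ISA monitor is launched, corresponding to the “no” branch of decision 240 followed by the “yes” branch of decision 270.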

FIG. 3 is a flowchart showing steps taken by a performance monitor to gather thread event data for a first CPU that is based on a first instruction set architecture (ISA). The performance monitor described in FIG. 3 is used when only one processor of a particular type is being used. In one embodiment, the processor element includes a single primary processing element (PPE) processor and multiple synergistic processing elements (SPEs). This embodiment is described in more detail in FIG. 7. In an environment with a single PPE, the steps shown in FIG. 3 can be used to monitor the threads running on the processor. FIG. 4, on the other hand, is used to monitor performance of threads when multiple processors of a particular type are present in the processor element.

Returning to FIG. 3, processing commences at 300 whereupon, at step 310, settings for the processor type that is being monitored are retrieved from performance configuration file 220. At step 320, event tracking is turned on for the events specified in the performance configuration file. At step 330, a thread that is currently running on the processor completes or is timed out. At step 340, the performance monitor gathers event data that was accumulated during execution of the thread that just completed. At step 350, this event data is stored in memory area 110 within shared memory 100. A determination is made as to whether to reset configuration settings (decision 360). For example, if the user edited the performance configuration file (see FIG. 2, steps 200-215), then the system would reset the configuration settings. To reset configuration settings, decision 360 branches to “yes” branch 365 which loops back to clear the configuration settings and retrieve the configuration settings stored in the performance configuration file. On the other hand, if configuration settings are not being reset, then decision 360 branches to “no” branch 370 whereupon a determination is made as to whether to continue monitoring threads running on the processor (decision 375). For example, the user may turn performance monitoring off for this processor or the system may be shut down. If monitoring continues, decision 375 branches to “yes” branch 380 which loops back to gather thread event data for the next thread that completes. This looping continues until monitoring is turned off or a system shutdown occurs, at which time decision 375 branches to “no” branch 385 and performance monitoring ends at 395.
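The FIG. 3 loop could be sketched roughly as below. The function and parameter names are hypothetical, a plain dictionary stands in for memory area 110, and thread completions are modeled as a finite sequence of (thread id, event counters) pairs rather than real hardware events.

```python
def monitor_single_processor(load_config, completed_threads, area_110,
                             config_changed=lambda: False):
    """Gather tracked event counts for each completed thread."""
    tracked = set(load_config()["events"])            # steps 310 and 320
    for thread_id, counters in completed_threads:     # step 330
        # Steps 340 and 350: gather the tracked events and store them.
        area_110[thread_id] = {
            event: count for event, count in counters.items()
            if event in tracked
        }
        if config_changed():                          # decision 360
            tracked = set(load_config()["events"])    # reload settings
    # The loop ends when no more completions arrive, standing in for
    # monitoring being turned off or a system shutdown (decision 375).


area_110 = {}
monitor_single_processor(
    load_config=lambda: {"events": ["cache_miss"]},
    completed_threads=[("t1", {"cache_miss": 3, "stall_cycles": 9})],
    area_110=area_110,
)
```

Only the events named in the configuration are stored, so the untracked counter in the example is dropped.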

FIG. 4 is a flowchart showing steps taken by a performance monitor to gather thread event data for one or more second CPUs that are each based on a second ISA. The performance monitor described in FIG. 4 is used when multiple processors of a particular type are being used. In one embodiment, the processor element includes multiple synergistic processing elements (SPEs). This embodiment is described in more detail in FIG. 7. In an environment with multiple SPEs, the steps shown in FIG. 4 can be used to monitor the threads running on the processors.

Processing commences at 400 whereupon, at step 410, settings for the processor type that is being monitored are retrieved from performance configuration file 220. At step 420, event tracking is turned on for the events specified in the performance configuration file. At step 430, a thread that is currently running on one of the processors completes or is timed out. A determination is made as to whether the processor where the thread was running is being monitored (decision 440). For example, the performance configuration file may indicate that one or more processors (e.g., SPEs) are not being monitored. If the performance monitor is monitoring the processor that was running the thread that just completed, decision 440 branches to “yes” branch 445 whereupon, at step 450, the performance monitor gathers event data that was accumulated during execution of the thread that just completed. At step 460, this event data is stored in memory area 120 within shared memory 100. On the other hand, if the performance monitor is not monitoring this SPE, decision 440 branches to “no” branch 465 bypassing steps 450 and 460.

A determination is made as to whether to reset configuration settings (decision 470). For example, if the user edited the performance configuration file (see FIG. 2, steps 200-215), then the system would reset the configuration settings. To reset configuration settings, decision 470 branches to “yes” branch 475 which loops back to clear the configuration settings and retrieve the configuration settings stored in the performance configuration file. On the other hand, if configuration settings are not being reset, then decision 470 branches to “no” branch 478 whereupon a determination is made as to whether to continue monitoring threads running on this type of processor (decision 480). If monitoring continues, decision 480 branches to “yes” branch 485 which loops back to gather thread event data for the next thread that completes on one of the processors (so long as the processor is being monitored). This looping continues until the user turns off performance monitoring or a system shutdown occurs, at which time decision 480 branches to “no” branch 490 and performance monitoring ends at 495.
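The distinguishing step of FIG. 4, the per-processor check at decision 440, could be sketched as follows. The names are hypothetical, a dictionary stands in for memory area 120, and the configuration reset path (decision 470) is omitted for brevity.

```python
def monitor_multi_processor(config, completions, area_120):
    """Store tracked event data only for threads on monitored processors."""
    tracked = set(config["events"])                       # steps 410 and 420
    monitored = set(config["monitored_processors"])
    for processor, thread_id, counters in completions:    # step 430
        if processor not in monitored:                    # decision 440, "no"
            continue                                      # bypass 450 and 460
        # Steps 450 and 460: gather tracked events and store them.
        area_120[thread_id] = {
            event: count for event, count in counters.items()
            if event in tracked
        }


area_120 = {}
monitor_multi_processor(
    {"events": ["dma_wait"], "monitored_processors": ["SPE0"]},
    [("SPE0", "t1", {"dma_wait": 7}), ("SPE1", "t2", {"dma_wait": 4})],
    area_120,
)
```

In the example, the thread that completed on the unmonitored SPE leaves no entry in the shared area.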

FIG. 5 is a flowchart showing the steps taken by a scheduler to allocate CPU time based on gathered thread event data. In the embodiment shown, a single scheduler is used to schedule threads for both types of processors (those based on the first ISA, e.g., a PPE, and those based on the second ISA, e.g., an SPE). However, the scheduler shown can easily be modified so that more than one scheduler is used to schedule the threads to the various processor types.

Processing commences at 500 whereupon, at step 510, the scheduler retrieves CPU utilization thresholds from performance configuration file 220. At step 520, the scheduler retrieves data regarding the next thread to be dispatched to one of the processors. At step 530, an ISA for the next thread is identified along with a processor that is based upon the identified ISA. For example, if the next thread runs on the first ISA, then a processor that is based on the first ISA (e.g., the PPE) is identified. On the other hand, if the thread runs on the second ISA, then one of the processors that is based on the second ISA (e.g., one of the SPEs) is identified.

In the embodiment shown, a determination is made as to whether the identified processor's utilization is below the threshold that was set for the processor (decision 540). The thresholds for the various processors were previously read in step 510. If the identified processor's utilization is below the threshold that was set for the processor, decision 540 branches to “yes” branch 545 whereupon, at step 550, the performance data gathered by the performance monitor for the thread is retrieved (from either memory 110 or memory 120 depending on whether it is a thread running on the first or second ISA) and the retrieved data is analyzed. At step 560, the amount of CPU time that the thread will receive is adjusted, if necessary, based on the analysis. Returning to decision 540, if the identified processor's utilization is not below the threshold that was set for the processor, decision 540 branches to “no” branch 565 bypassing steps 550 and 560. In an alternate embodiment, decision 540 is not performed so that steps 550 and 560 are performed regardless of the processor's utilization.

At step 570, the thread is dispatched to the identified processor once the thread currently running on the identified processor ends or is swapped out. A determination is made as to whether to reset the threshold values (decision 575). The thresholds would be reset if the user edits performance configuration file 220 using steps 200 through 215 shown in FIG. 2. If the threshold values are reset, decision 575 branches to “yes” branch 580 which loops back to read in the new utilization thresholds at step 510. On the other hand, if the utilization threshold values are not reset, decision 575 branches to “no” branch 582.

Another determination is made as to whether to continue processing (decision 585). Processing continues while the system is running in order to schedule threads for execution (i.e., processing continues until the system is shut down). If processing continues, decision 585 branches to “yes” branch 588 which loops back to schedule and dispatch the next thread for execution. This looping continues until the system is shut down, at which point decision 585 branches to “no” branch 590 and processing ends at 595.
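One pass through the FIG. 5 scheduling steps for a single thread could be sketched as below. Several things here are assumptions made only for illustration: the function and key names, the convention that a larger priority value yields more CPU time, the fixed boost of 1, and the `needs_more_time` predicate, which stands in for the analysis at step 550 whose criteria the description leaves open.

```python
def schedule_next(thread, utilization, thresholds, perf_data,
                  needs_more_time, boost=1):
    """Return (processor, priority) to use when dispatching the thread."""
    processor = thread["processor"]                       # step 530
    priority = thread["priority"]
    if utilization[processor] < thresholds[processor]:    # decision 540
        data = perf_data.get(thread["id"], {})            # step 550
        if needs_more_time(data):
            # Step 560: raise priority so the thread gets more CPU time
            # (assumes larger values mean more time).
            priority += boost
    return processor, priority                            # input to step 570


thread = {"id": "t1", "processor": "SPE0", "priority": 5}
perf_data = {"t1": {"stall_cycles": 100}}


def stalled(data):
    # Hypothetical analysis rule: many stall cycles means more time is needed.
    return data.get("stall_cycles", 0) > 50


boosted = schedule_next(thread, {"SPE0": 0.4}, {"SPE0": 0.8}, perf_data, stalled)
unchanged = schedule_next(thread, {"SPE0": 0.9}, {"SPE0": 0.8}, perf_data, stalled)
```

In the first call the processor is below its threshold, so the thread's priority is raised; in the second call utilization is not below the threshold, so steps 550 and 560 are bypassed and the priority is left as it was.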

FIG. 6 illustrates information handling system 601 which is a simplified example of a computer system capable of performing the computing operations described herein. Computer system 601 includes processor 600 which is coupled to host bus 602. A level two (L2) cache memory 604 is also coupled to host bus 602. Host-to-PCI bridge 606 is coupled to main memory 608, includes cache memory and main memory control functions, and provides bus control to handle transfers among PCI bus 610, processor 600, L2 cache 604, main memory 608, and host bus 602. Main memory 608 is coupled to Host-to-PCI bridge 606 as well as host bus 602. Devices used solely by host processor(s) 600, such as LAN card 630, are coupled to PCI bus 610. Service Processor Interface and ISA Access Pass-through 612 provides an interface between PCI bus 610 and PCI bus 614. In this manner, PCI bus 614 is insulated from PCI bus 610. Devices, such as flash memory 618, are coupled to PCI bus 614. In one implementation, flash memory 618 includes BIOS code that incorporates the necessary processor executable code for a variety of low-level system functions and system boot functions.

PCI bus 614 provides an interface for a variety of devices that are shared by host processor(s) 600 and Service Processor 616 including, for example, flash memory 618. PCI-to-ISA bridge 635 provides bus control to handle transfers between PCI bus 614 and ISA bus 640, universal serial bus (USB) functionality 645, power management functionality 655, and can include other functional elements not shown, such as a real-time clock (RTC), DMA control, interrupt support, and system management bus support. Nonvolatile RAM 620 is attached to ISA Bus 640. Service Processor 616 includes JTAG and I2C busses 622 for communication with processor(s) 600 during initialization steps. JTAG/I2C busses 622 are also coupled to L2 cache 604, Host-to-PCI bridge 606, and main memory 608 providing a communications path between the processor, the Service Processor, the L2 cache, the Host-to-PCI bridge, and the main memory. Service Processor 616 also has access to system power resources for powering down information handling device 601.

Peripheral devices and input/output (I/O) devices can be attached to various interfaces (e.g., parallel interface 662, serial interface 664, keyboard interface 668, and mouse interface 670) coupled to ISA bus 640. Alternatively, many I/O devices can be accommodated by a super I/O controller (not shown) attached to ISA bus 640.

In order to attach computer system 601 to another computer system to copy files over a network, LAN card 630 is coupled to PCI bus 610. Similarly, to connect computer system 601 to an ISP to connect to the Internet using a telephone line connection, modem 675 is connected to serial port 664 and PCI-to-ISA Bridge 635.

While the computer system described in FIG. 6 is capable of executing the processes described herein, this computer system is simply one example of a computer system. Those skilled in the art will appreciate that many other computer system designs are capable of performing the processes described herein.

FIG. 7 is a block diagram illustrating a processing element having a main processor and a plurality of secondary processors sharing a system memory. FIG. 7 depicts a heterogeneous processing environment that can be used to implement the present invention. Primary Processor Element (PPE) 705 includes processing unit (PU) 710, which, in one embodiment, acts as the main processor and runs an operating system. Processing unit 710 may be, for example, a Power PC core executing a Linux operating system. PPE 705 also includes a plurality of synergistic processing elements (SPEs) such as SPEs 745, 765, and 785. The SPEs include synergistic processing units (SPUs) that act as secondary processing units to PU 710, a memory storage unit, and local storage. For example, SPE 745 includes SPU 760, MMU 755, and local storage 759; SPE 765 includes SPU 770, MMU 775, and local storage 779; and SPE 785 includes SPU 790, MMU 795, and local storage 799.

Each SPE may be configured to perform a different task, and accordingly, in one embodiment, each SPE may be accessed using different instruction sets. If PPE 705 is being used in a wireless communications system, for example, each SPE may be responsible for separate processing tasks, such as modulation, chip rate processing, encoding, network interfacing, etc. In another embodiment, the SPEs may have identical instruction sets and may be used in parallel with each other to perform operations benefiting from parallel processing.

PPE 705 may also include level 2 cache, such as L2 cache 715, for the use of PU 710. In addition, PPE 705 includes system memory 720, which is shared between PU 710 and the SPUs. System memory 720 may store, for example, an image of the running operating system (which may include the kernel), device drivers, I/O configuration, etc., executing applications, as well as other data. System memory 720 includes the local storage units of one or more of the SPEs, which are mapped to a region of system memory 720. For example, local storage 759 may be mapped to mapped region 735, local storage 779 may be mapped to mapped region 740, and local storage 799 may be mapped to mapped region 742. PU 710 and the SPEs communicate with each other and system memory 720 through bus 717 that is configured to pass data between these devices.

The MMUs are responsible for transferring data between an SPU's local store and the system memory. In one embodiment, an MMU includes a direct memory access (DMA) controller configured to perform this function. PU 710 may program the MMUs to control which memory regions are available to each of the MMUs. By changing the mapping available to each of the MMUs, the PU may control which SPU has access to which region of system memory 720. In this manner, the PU may, for example, designate regions of the system memory as private for the exclusive use of a particular SPU. In one embodiment, the SPUs' local stores may be accessed by PU 710 as well as by the other SPUs using the memory map. In one embodiment, PU 710 manages the memory map for the common system memory 720 for all the SPUs. The memory map table may include PU 710's L2 Cache 715, system memory 720, as well as the SPUs' shared local stores.

In one embodiment, the SPUs process data under the control of PU 710. The SPUs may be, for example, digital signal processing cores, microprocessor cores, micro controller cores, etc., or a combination of the above cores. Each one of the local stores is a storage area associated with a particular SPU. In one embodiment, each SPU can configure its local store as a private storage area, a shared storage area, or an SPU may configure its local store as a partly private and partly shared storage.

For example, if an SPU requires a substantial amount of local memory, the SPU may allocate 100% of its local store to private memory accessible only by that SPU. If, on the other hand, an SPU requires a minimal amount of local memory, the SPU may allocate 10% of its local store to private memory and the remaining 90% to shared memory. The shared memory is accessible by PU 710 and by the other SPUs. An SPU may reserve part of its local store in order for the SPU to have fast, guaranteed memory access when performing tasks that require such fast access. The SPU may also reserve some of its local store as private when processing sensitive data, as is the case, for example, when the SPU is performing encryption/decryption.
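The private/shared split described above amounts to simple arithmetic, sketched here. The function name is hypothetical, and the local store size used in the example is an arbitrary illustrative value, since the description does not state the size of a local store.

```python
def split_local_store(total_bytes, private_percent):
    """Return (private_bytes, shared_bytes) for a local store."""
    if not 0 <= private_percent <= 100:
        raise ValueError("private_percent must be between 0 and 100")
    private_bytes = total_bytes * private_percent // 100
    return private_bytes, total_bytes - private_bytes


# Illustrative size only; not taken from the description.
minimal = split_local_store(1000, 10)     # 10% private, 90% shared
exclusive = split_local_store(1000, 100)  # entire local store private
```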

One of the preferred implementations of the invention is a client application, namely, a set of instructions (program code) or other functional descriptive material in a code module that may, for example, be resident in the random access memory of the computer. Until required by the computer, the set of instructions may be stored in another computer memory, for example, in a hard disk drive, or in a removable memory such as an optical disk (for eventual use in a CD ROM) or floppy disk (for eventual use in a floppy disk drive), or downloaded via the Internet or other computer network. Thus, the present invention may be implemented as a computer program product for use in a computer. In addition, although the various methods described are conveniently implemented in a general purpose computer selectively activated or reconfigured by software, one of ordinary skill in the art would also recognize that such methods may be carried out in hardware, in firmware, or in more specialized apparatus constructed to perform the required method steps. Functional descriptive material is information that imparts functionality to a machine. Functional descriptive material includes, but is not limited to, computer programs, instructions, rules, facts, definitions of computable functions, objects, and data structures.

While particular embodiments of the present invention have been shown and described, it will be obvious to those skilled in the art that, based upon the teachings herein, changes and modifications may be made without departing from this invention and its broader aspects. Therefore, the appended claims are to encompass within their scope all such changes and modifications as are within the true spirit and scope of this invention. Furthermore, it is to be understood that the invention is solely defined by the appended claims. It will be understood by those with skill in the art that if a specific number of an introduced claim element is intended, such intent will be explicitly recited in the claim, and in the absence of such recitation no such limitation is present. For non-limiting example, as an aid to understanding, the following appended claims contain usage of the introductory phrases “at least one” and “one or more” to introduce claim elements. However, the use of such phrases should not be construed to imply that the introduction of a claim element by the indefinite articles “a” or “an” limits any particular claim containing such introduced claim element to inventions containing only one such element, even when the same claim includes the introductory phrases “one or more” or “at least one” and indefinite articles such as “a” or “an”; the same holds true for the use in the claims of definite articles.

1. A computer-implemented method comprising: gathering thread performance data corresponding to a first plurality of threads running on one or more first processors that are based on a first instruction set architecture (ISA); gathering thread performance data corresponding to a second plurality of threads running on one or more second processors that are based on a second ISA, wherein the first processors and the second processors share a memory accessible from the first and second processors; analyzing the thread performance data gathered for the first and second plurality of threads; and based on the analysis, adjusting an amount of CPU time allocated to at least one of the threads included in the first and second plurality of threads.
2. The method of claim 1 wherein adjusting the amount of CPU time includes modifying a priority value.
3. The method of claim 1 further comprising: receiving performance selections from a user; and storing the received performance selections in a storage area, wherein the performance data gathered corresponds to the received performance selections.
4. The method of claim 3 further comprising: selecting the one or more first processors and the one or more second processors based upon the received performance selections, wherein thread performance data is only gathered for threads running on the selected first and second processors.
5. The method of claim 3 wherein the analyzing further comprises: comparing the gathered thread performance data to one or more thresholds included in the received performance selections.
6. The method of claim 1 further comprising: scheduling the first and second plurality of threads using a common scheduler that reads the gathered thread performance data from the shared memory.
7. The method of claim 1 wherein a common scheduler schedules the first and second plurality of threads, the method further comprising: storing the gathered thread performance data in the shared memory; retrieving CPU thread utilization thresholds corresponding to at least one of the first processors and to at least one of the second processors; and comparing the retrieved CPU thread utilization thresholds with current CPU utilizations that correspond to the retrieved CPU thread utilization thresholds, wherein the analyzing and adjusting is only performed for those processors with current CPU utilizations that are below the retrieved CPU thread utilization thresholds.
8. An information handling system comprising: a plurality of heterogeneous processors, wherein the plurality of heterogeneous processors includes one or more first processors that are based on a first instruction set architecture (ISA) and one or more second processors that are based on a second instruction set architecture (ISA); a local memory corresponding to each of the plurality of heterogeneous processors; a shared memory accessible by the heterogeneous processors; and a set of instructions stored in one of the local memories, wherein one or more of the heterogeneous processors executes the set of instructions in order to perform actions of: gathering thread performance data corresponding to a first plurality of threads running on the first processors; gathering thread performance data corresponding to a second plurality of threads running on the second processors; analyzing the thread performance data gathered for the first and second plurality of threads; and based on the analysis, adjusting an amount of CPU time allocated to at least one of the threads included in the first and second plurality of threads.
9. The information handling system of claim 8 further comprising instructions that perform the actions of: receiving performance selections from a user; and storing the received performance selections in a storage area, wherein the performance data gathered corresponds to the received performance selections.
10. The information handling system of claim 8 further comprising instructions that perform the actions of: selecting the one or more first processors and the one or more second processors based upon the received performance selections, wherein thread performance data is only gathered for threads running on the selected first and second processors.
11. The information handling system of claim 8 wherein the analyzing further comprises instructions that perform the actions of: comparing the gathered thread performance data to one or more thresholds included in the received performance selections.
12. The information handling system of claim 8 further comprising instructions that perform the actions of: scheduling the first and second plurality of threads using a common scheduler that reads the gathered thread performance data from the shared memory.
13. The information handling system of claim 8 wherein a common scheduler schedules the first and second plurality of threads, the information handling system further comprising instructions that perform the actions of: storing the gathered thread performance data in the shared memory; retrieving CPU thread utilization thresholds corresponding to at least one of the first processors and to at least one of the second processors; and comparing the retrieved CPU thread utilization thresholds with current CPU utilizations that correspond to the retrieved CPU thread utilization thresholds, wherein the analyzing and adjusting is only performed for those processors with current CPU utilizations that are below the retrieved CPU thread utilization thresholds.
14. A computer program product stored in a computer readable medium, comprising functional descriptive material that, when executed by a data processing system, causes the data processing system to perform actions that include: gathering thread performance data corresponding to a first plurality of threads running on one or more first processors that are based on a first instruction set architecture (ISA); gathering thread performance data corresponding to a second plurality of threads running on one or more second processors that are based on a second ISA, wherein the first processors and the second processors share a memory accessible from the first and second processors; analyzing the thread performance data gathered for the first and second plurality of threads; and based on the analysis, adjusting an amount of CPU time allocated to at least one of the threads included in the first and second plurality of threads.
15. The computer program product of claim 14 wherein adjusting the amount of CPU time includes modifying a priority value.
16. The computer program product of claim 14 further comprising functional descriptive material that, when executed by the data processing system, causes the data processing system to perform actions that include: receiving performance selections from a user; and storing the received performance selections in a storage area, wherein the performance data gathered corresponds to the received performance selections.
17. The computer program product of claim 16 further comprising functional descriptive material that, when executed by the data processing system, causes the data processing system to perform actions that include: selecting the one or more first processors and the one or more second processors based upon the received performance selections, wherein thread performance data is only gathered for threads running on the selected first and second processors.
18. The computer program product of claim 16 further comprising functional descriptive material that, when executed by the data processing system, causes the data processing system to perform actions that include: comparing the gathered thread performance data to one or more thresholds included in the received performance selections.
19. The computer program product of claim 14 further comprising functional descriptive material that, when executed by the data processing system, causes the data processing system to perform actions that include: scheduling the first and second plurality of threads using a common scheduler that reads the gathered thread performance data from the shared memory.
20. The computer program product of claim 14 further comprising functional descriptive material that, when executed by the data processing system, causes the data processing system to perform actions that include a common scheduler that schedules the first and second plurality of threads based on actions that include: storing the gathered thread performance data in the shared memory; retrieving CPU thread utilization thresholds corresponding to at least one of the first processors and to at least one of the second processors; and comparing the retrieved CPU thread utilization thresholds with current CPU utilizations that correspond to the retrieved CPU thread utilization thresholds, wherein the analyzing and adjusting is only performed for those processors with current CPU utilizations that are below the retrieved CPU thread utilization thresholds.
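
As a non-limiting illustration only, the gathering, analyzing, and adjusting actions recited above might be sketched in Python as follows. The sketch is not the claimed implementation. The names ThreadSample, adjust_cpu_time, metric, and boost are hypothetical, as are the sample values, and the sketch assumes that a larger priority value yields more CPU time and that a thread whose monitored metric exceeds a user-selected threshold is one that needs additional CPU time. Threads on processors whose current utilization is not below the retrieved utilization threshold are skipped.

```python
from dataclasses import dataclass


@dataclass
class ThreadSample:
    """One performance-monitor reading for a thread (all fields hypothetical)."""
    thread_id: int
    processor_id: int
    isa: str         # "first" or "second" instruction set architecture
    metric: float    # monitored value selected by the user (assumed)
    priority: int    # assumption: larger value means more CPU time


def adjust_cpu_time(samples, metric_threshold, utilization, utilization_limits, boost=1):
    """Raise the priority of threads whose metric exceeds the threshold.

    Only threads on processors whose current utilization is below the
    processor's utilization threshold are analyzed and adjusted.
    Returns the ids of the threads whose priority was changed.
    """
    adjusted = []
    for sample in samples:
        limit = utilization_limits.get(sample.processor_id)
        current = utilization.get(sample.processor_id)
        if limit is None or current is None or current >= limit:
            continue  # processor unknown, or not below its utilization threshold
        if sample.metric > metric_threshold:
            sample.priority += boost  # adjust CPU time by modifying a priority value
            adjusted.append(sample.thread_id)
    return adjusted


# Stand-in for the shared memory holding samples from both processor types.
shared_memory = [
    ThreadSample(1, 0, "first", 0.80, 5),
    ThreadSample(2, 0, "first", 0.10, 5),
    ThreadSample(3, 1, "second", 0.90, 5),
    ThreadSample(4, 2, "second", 0.95, 5),
]
utilization = {0: 0.40, 1: 0.50, 2: 0.99}  # current utilization per processor
limits = {0: 0.90, 1: 0.90, 2: 0.90}       # utilization threshold per processor

changed = adjust_cpu_time(shared_memory, 0.5, utilization, limits)
print(changed)
```

In this example, threads 1 and 3 receive a higher priority value. Thread 2 is below the metric threshold, and thread 4 runs on a processor whose utilization is not below its utilization threshold.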